A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) FROM
THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed 

DETAILED ACTION
Response to Amendment 

1. Applicant's amendment to the specification to remove embedded links is
acknowledged. However, applicant has not removed embedded hyperlinks at page 2, 
lines 5 and 14. 

2. Applicant's amendment to claim 10 is acknowledged. Consequently, the rejection of
claim 10 under 35 U.S.C. 112, second paragraph, is withdrawn.

3. Applicant's arguments regarding claims 1-42 have been fully considered but they 
are moot in view of the new grounds of rejection presented in this Office Action. It is 
also noted that Applicant argues that "the removal of unimportant words in Aiken is a 
syntactic technique as opposed to the filtering disclosed in applicant's disclosure which 
is a semantic technique". Note that the claimed "semantic filtering" merely appears in 
newly added claim 43. Note also that claim 1 does not require any semantic filtering. 
Claim 8 which depends from claims 1 and 2 even recites "the step of filtering further 
comprises removing from the token stream at least one token corresponding to a stop 
word" which is clearly a syntactic filtering technique. 

Specification 

4. The incorporation of essential material in the specification by reference to 
publications is improper. Applicant is required to amend the disclosure to include the 
material incorporated by reference. The amendment must be accompanied by an 
affidavit or declaration executed by the applicant, or a practitioner representing the
applicant, stating that the amendatory material consists of the same material 
incorporated by reference in the referencing application. See In re Hawkins, 486 F.2d 
569, 179 USPQ 157 (CCPA 1973); In re Hawkins, 486 F.2d 579, 179 USPQ 163 (CCPA 
1973); and In re Hawkins, 486 F.2d 577, 179 USPQ 167 (CCPA 1973). 

5. The disclosure is further objected to because of the following informalities: 
applicant has not removed all embedded hyperlinks at page 2. 

Appropriate correction is required. 

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall
set forth the best mode contemplated by the inventor of carrying out his invention.

6. Claim 43 is rejected under 35 U.S.C. 112, first paragraph, as containing subject
matter which was not described in the specification in such a way as to enable one
skilled in the art to which it pertains, or with which it is most nearly connected, to make
and/or use the invention. The specification does not provide support for the claimed
"semantic filtering" recited in claim 43.

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

7. Claim 43 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite
for failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention because it is not clear how the "semantic filtering" is performed. 

The art rejection of claim 43 is applied as best understood in light of the rejection 
under 35 U.S.C. 112, first and second paragraphs discussed above. 

Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

8. Claims 1-6, 8, 10-14, 19-21, 23-43 are rejected under 35 U.S.C. 102(a), (e) as
being anticipated by Aiken (US 6,240,409). 

Regarding claim 1, Aiken discloses a method for detecting similar documents
including all the claimed subject matter (see Figures 1a, 1b, column 3, lines 44-47). The
claimed document is met by the string including sub-strings in the method of Aiken (see
column 4, lines 38-53). Note the steps of obtaining a document (102) and filtering the
document (106). Aiken discloses the step of determining a document identifier for the
filtered document and a hash value for the filtered document when Aiken shows that the
sub-string is associated with a position and a hash value pair (see Figure 2). The
claimed step of generating a tuple for the filtered document is met by the fact that a
hash value and position pair is created and stored (see step 114, column 6, lines 7-28). 
Since a string is composed of sub-strings, the hash values computed for sub-strings are 
clearly computed for documents as claimed. The tuple is clearly compared with a 
plurality of tuples for the method of Aiken to detect similar strings. Aiken discloses 
detecting if the document is similar to another document by determining if the tuple is 
clustered with another tuple in the document storage structure (see Figures 4a, 4b, 4c,
column 7, lines 25-34, column 10, line 4- column 12, line 2). 
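
For illustration only, the general flow discussed for claim 1 (filter the document, compute a hash value, form a tuple, and check whether the tuple is co-located with another tuple in a storage structure) can be sketched as below. This is a minimal sketch and is not taken from Aiken or from the application: the stop-word list, the use of SHA-1, and the names filter_tokens, make_tuple and DocumentStore are all assumptions introduced here.

```python
import hashlib
from collections import defaultdict

# Assumed stop-word list, echoing the words cited in the discussion of claim 10.
STOP_WORDS = {"the", "and", "this", "is"}


def filter_tokens(text):
    """Split on whitespace, strip punctuation, lowercase, and drop stop words."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def make_tuple(doc_id, text):
    """Return a (document identifier, hash value) tuple for the filtered document."""
    filtered = " ".join(filter_tokens(text))
    digest = hashlib.sha1(filtered.encode("utf-8")).hexdigest()
    return (doc_id, digest)


class DocumentStore:
    """Hypothetical storage structure: hash value -> list of document identifiers."""

    def __init__(self):
        self.bins = defaultdict(list)

    def find_similar(self, tup):
        """Return identifiers of other documents sharing the same hash bin."""
        doc_id, digest = tup
        return [d for d in self.bins.get(digest, []) if d != doc_id]

    def insert(self, tup):
        """Store the tuple in the bin for its hash value."""
        doc_id, digest = tup
        self.bins[digest].append(doc_id)
```

Under these assumptions, two documents that differ only in stop words, case, or punctuation yield the same hash value and therefore fall into the same bin, which is the sense of "clustered" used in this sketch.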

Regarding claim 2, Aiken discloses parsing and filtering the document when 
Aiken shows removing unimportant words (see column 4, lines 54-67). Clearly the 
filtered document comprises a token stream of a plurality of tokens as claimed. 

Regarding claim 3, Aiken discloses retaining a token according to at least a token 
threshold when Aiken shows that common or frequent words are removed. A threshold 
has to be present for the method of Aiken to determine common and frequent words
(see column 4, lines 38-53). 

Claim 4 merely reads on the translated document in the method of Aiken after all 
common words and frequent words have been removed. 

Regarding claim 5, Aiken discloses determining the hash value for the filtered 
document by processing individually each retained token in the token stream when 
Aiken shows processing sub-strings and determining their hash values (see column 6,
lines 7-28, column 9, lines 24-26). 

Regarding claim 6, Aiken discloses determining a score for each token in the
token stream and comparing the score for each token to a first token threshold when 
Aiken shows that common or frequent words are removed. A threshold has to be 
present for the method of Aiken to determine common and frequent words (see column 
4, lines 38-53). The token stream is clearly modified by removing each token having a 
score not satisfying the first token threshold and retaining each token having a score 
satisfying the first token threshold as claimed since the common words and frequent 
words are removed in the method of Aiken. 

Regarding claim 8, Aiken discloses filtering by removing from the token stream at 
least one token corresponding to a stop word (see column 4, lines 57-58, column 8, line 
67- column 9, line 3).

Regarding claim 10, Aiken discloses removing a token from a token stream 
based on collection statistics and at least one token threshold when Aiken shows that 
the method removes words such as "the", "and", "this", "is" (see column 4, lines 57-58, column
8, line 67- column 9, line 3). A token threshold has to be present for the method of Aiken
to determine common and frequent words (see column 4, lines 38-53).

Regarding claim 11, Aiken discloses removing a token from a token stream (see
column 4, lines 57-58, column 8, line 67- column 9, line 3). 

Regarding claim 12, Aiken discloses removing formatting from the document
(see column 4, lines 55-57). 

Regarding claims 13, 14, clearly the method of Aiken uses collection statistics 
pertaining to a plurality of documents for filtering the document since the input file is 
compared to a set of collected files to detect similarity (see column 2, lines 47-51). The
collection statistics have to be present for the collected documents to be clustered as
shown in the method of Aiken (see Figure 4c, column 11, line 47- column 12, line 2).

Regarding claim 19, Aiken discloses a hash table (see column 12, lines 40-44). 

Regarding claim 20, Aiken discloses that the document storage structure 
comprises a tree (see column 8, lines 30-38). 

Regarding claim 21, Aiken discloses that the tree comprises a binary tree (see 
column 8, lines 36-38). 

Regarding claim 23, Aiken discloses a hash table and at least one tree (see 
column 5, lines 33-40, column 8, lines 30-38). 

Regarding claim 24, Aiken discloses inserting the tuple into the document 
storage structure (see Figures 1a, 1b, 4a, 4b, 4c).

Regarding claim 25, the hash table of Aiken clearly comprises a plurality of bins 
of tuples as claimed and the step of determining if the tuple is clustered with another
tuple clearly comprises determining if the tuple is co-located with another tuple at a bin of
a hash table (see Figures 1, 2, 4c, column 7, line 46- column 8, line 33).

Regarding claim 26, Aiken discloses a tree comprising a plurality of branches, 
each bucket of the tree comprising at least one tuple and wherein the step of 
determining if the tuple is clustered with another tuple clearly comprises determining if
the tuple is co-located with another tuple in a bucket of the tree (see column 8, lines 31- 
54, Figure 4c). 

Claims 27, 29 correspond to a system to perform the method of claim 1, thus are
rejected for the same reasons stated in claim 1 above.

Claim 28 corresponds to a computer program product to perform the method of 
claim 1, thus is rejected for the same reasons stated in claim 1 above.

Claim 30 is a mere combination of claims 1-4, 26, thus is rejected for the same 
reasons stated in claims 1-4, 26 above. 

Claims 31, 33 correspond to a system to perform the method of claim 30, thus 
are rejected for the same reasons stated in claim 30 above. 

Claim 32 corresponds to a computer program product to perform the method of 
claim 30, thus is rejected for the same reasons stated in claim 30 above.

Regarding claim 34, Aiken discloses all the claimed subject matter including 
determining a hash value for a document (see Figure 1, column 4, line 17- column 7,
line 45, column 9, lines 16-30), accessing a document storage structure comprising a
plurality of hash values, each hash value representing one of a plurality of documents
(see Figure 4a, column 10, line 4- column 11, line 46), determining if the hash value is
equivalent to another hash value in the document storage structure (see Figure 4c,
column 11, line 47- column 12, line 2). Note the document is met by the strings 
including sub-strings in the method of Aiken (see column 4, lines 38-53). 

Claims 35, 37 correspond to a system to perform the method of claim 34, thus
are rejected for the same reasons stated in claim 34 above.

Claim 36 corresponds to a computer program product to perform the method of
claim 34, thus is rejected for the same reasons stated in claim 34 above.

Regarding claim 38, Aiken discloses a method for detecting similar documents
including comparing a document to a plurality of documents in a document collection
using a hash algorithm and collection statistics to detect if the document is similar to any
of the documents in the document collection (see Figures 1a, 1b, 4a, 4b, 4c). The
claimed collection statistics have to be present and used in the method of Aiken for the
similar documents to be clustered (see Figure 4c, column 11, line 47- column 12, line
2).

Regarding claim 39, clearly the collection statistics pertain to the document 
collection since the statistics are used in clustering the collected documents (see Figure 
4c). 

Claims 40, 42 correspond to a system to perform the method of claim 38, thus 
are rejected for the same reasons stated in claim 38 above. 

Claim 41 corresponds to a computer program product to perform the method of 
claim 38, thus is rejected for the same reasons stated in claim 38 above. 

Regarding claim 43, Aiken discloses the step of performing semantic filtering on 
the document when Aiken shows that each string is translated according to rules 
tailored to the type of document being translated such as the syntax and semantics of a
particular programming language (see column 4, lines 47-53).

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains.
Patentability shall not be negatived by the manner in which the invention was made. 

9. Claims 7, 9, 15-18, 22 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Aiken (US 6,240,409).

Regarding claim 7, although Aiken does not specifically show the step of 
comparing the score for each retained token to a second token threshold and modifying 
the token stream as claimed, Aiken explicitly shows that not every substring's hash value
is stored (see column 6, lines 29-30). Therefore, it would have been obvious to one of
ordinary skill in the art to include the claimed feature while implementing the method 
taught by Aiken in order to further filter the document and save memory. 

Regarding claim 9, although Aiken does not explicitly disclose filtering by 
removing a duplicate of another token in the token stream, it would have been obvious 
to one of ordinary skill in the art to include such a feature in order to avoid processing 
redundant tokens, thus saving time and resources.

Regarding claims 15-18, although Aiken does not explicitly show that the method 
uses specific hash algorithms as claimed, it is notoriously well known in the art to use 
different hash algorithms depending on users' requirements. Therefore, it would have 
been obvious to one of ordinary skill in the art to include all the claimed features while 
implementing the method of Aiken in order to suit users' needs. 

Regarding claim 22, although Aiken does not explicitly show that the binary tree
is balanced, it would have been obvious to one of ordinary skill in the art to include such 
a feature while implementing the method of Aiken in order to store data efficiently and to 
facilitate searching and localization. 

Conclusion

10. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Aiken et al. (US 6,493,709) teach a method for digitally shredding similar documents
within large document sets in a data processing environment. 



11. Any inquiry concerning this communication or earlier communications from the
If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Safet Metjahic, can be reached at 703-308-1436. The fax phone numbers
for the organization where this application or proceeding is assigned are 703-746-7239 
for regular communications and 703-746-7238 for After Final communications. 

Any inquiry of a general nature or relating to the status of this application or 
3900.